Welcome to PixieDust

This notebook introduces PixieDust, the Python library that makes data visualization easy.

Get started

This notebook is simple and self-explanatory, but it wouldn't hurt to keep the PixieDust documentation open for reference.

New to notebooks? Don't worry: all you need to know is how to run a code cell. Put your cursor in the cell and press Shift + Enter.



In [1]:

    
# Make sure you have the latest version of PixieDust installed on your system
# Only run this cell if you did _not_ install PixieDust from source
# To confirm you have the latest, uncomment the next line and run this cell
#!pip install --user --upgrade pixiedust

Now that you have PixieDust installed and up-to-date on your system, you need to import it into this notebook. This is the last dependency before you can play with PixieDust.



In [1]:

    
# Run this cell
import pixiedust









    



Pixiedust database opened successfully

Pixiedust version 1.0.4
        
        






    



---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-c4bb40302b4e> in <module>()
      1 # Run this cell
----> 2 import pixiedust

/Users/mbrobergus.ibm.com/Documents/pixiedust/pixiedust/__init__.py in <module>()
     34 
     35     #shortcut to packageManager
---> 36     import pixiedust.packageManager as packageManager
     37     printAllPackages=packageManager.printAllPackages
     38     installPackage=packageManager.installPackage

/Users/mbrobergus.ibm.com/Documents/pixiedust/pixiedust/packageManager/__init__.py in <module>()
     15 # -------------------------------------------------------------------------------
     16 
---> 17 from .packageManager import PackageManager
     18 
     19 #shortcut to packageManager

/Users/mbrobergus.ibm.com/Documents/pixiedust/pixiedust/packageManager/packageManager.py in <module>()
     23 from pixiedust.utils.storage import *
     24 from pixiedust.utils.printEx import *
---> 25 from pyspark import SparkContext
     26 from .package import Package
     27 from .downloader import Downloader, RequestException

ImportError: No module named pyspark

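Once you see the success message in the output of import pixiedust, you're all set. If you get an ImportError like the one above instead (No module named pyspark), the kernel cannot find PySpark, which this version of PixieDust imports when it loads; switch to a Spark-enabled kernel and run the cell again.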
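To see which of the two packages the kernel can actually find, a quick check like the one below may help. It is only a sketch: it assumes a Python 3 kernel (it relies on importlib.util.find_spec from the standard library), and the helper name missing_modules is made up for this example and is not part of PixieDust.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this kernel cannot locate."""
    # find_spec() returns None when a top-level module is not on the path;
    # it locates the module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# pyspark is listed first because the traceback above fails on that import.
missing = missing_modules(["pyspark", "pixiedust"])
if missing:
    print("Not importable from this kernel: " + ", ".join(missing))
else:
    print("pyspark and pixiedust can both be located")
```

An empty result only means the packages were found, not that they are compatible with each other, so the import pixiedust cell remains the real test.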

Behold, display()

In the next cell, build a very simple dataset and store it in a variable.



In [3]:

    
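# Run this cell to
# a) import SQLContext and build a SQL context for a Spark dataframe
#    (assumes sc, the SparkContext, is already defined by a Spark-enabled kernel)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
# b) create a Spark dataframe and assign it to a variable
df = sqlContext.createDataFrame(
[("Green", 75),
 ("Blue", 25)],
["Colors", "%"])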









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-e7413d831fe3> in <module>()
      1 # Run this cell to
      2 # a) build a SQL context for a Spark dataframe
----> 3 sqlContext=SQLContext(sc)
      4 # b) create Spark dataframe, and assign it to a variable
      5 df = sqlContext.createDataFrame(

NameError: name 'SQLContext' is not defined

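The traceback above was recorded in a session where SQLContext had not been imported from pyspark.sql; with that import in place, and a Spark kernel that provides sc, the cell runs without any output. The data in the variable we just created is then ready to be displayed, without any code other than the call to display().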



In [3]:

    
# Run this cell to display the dataframe above as a pie chart
display(df)









    




Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter.

[Chart: Colors in this pie chart, by %]

After running the cell above, you should have seen a Spark dataframe displayed as a pie chart, along with some controls to tweak the display. All that came from passing the dataframe variable to display().

In the next cell, we'll pass more interesting data to display(), which will also offer more advanced controls.



In [4]:

    
# create another dataframe, in a new variable
df2 = sqlContext.createDataFrame(
[(2010, 'Camping Equipment', 3),
 (2010, 'Golf Equipment', 1),
 (2010, 'Mountaineering Equipment', 1),
 (2010, 'Outdoor Protection', 2),
 (2010, 'Personal Accessories', 2),
 (2011, 'Camping Equipment', 4),
 (2011, 'Golf Equipment', 5),
 (2011, 'Mountaineering Equipment', 2),
 (2011, 'Outdoor Protection', 4),
 (2011, 'Personal Accessories', 2),
 (2012, 'Camping Equipment', 5),
 (2012, 'Golf Equipment', 5),
 (2012, 'Mountaineering Equipment', 3),
 (2012, 'Outdoor Protection', 5),
 (2012, 'Personal Accessories', 3),
 (2013, 'Camping Equipment', 8),
 (2013, 'Golf Equipment', 5),
 (2013, 'Mountaineering Equipment', 3),
 (2013, 'Outdoor Protection', 8),
 (2013, 'Personal Accessories', 4)],
["year","category","unique_customers"])

# This time, we've combined the dataframe and display() call in the same cell
# Run this cell 
display(df2)









    




[Chart: Customers by Category clustered by Year]

display() controls

Renderers

This chart, like the first one, is rendered by matplotlib. With PixieDust, you have other options. To toggle between renderers, use the Renderers control at the top right of the display output:

Bokeh is interactive; play with the controls along the top of the chart, e.g., zoom and save.
Matplotlib is static; you can save the image as a PNG.

Chart options

Chart types: At top left, you should see an option to display the dataframe as a table. You should also see a dropdown menu with other chart options, including bar charts, pie charts, scatter plots, and so on.
Options: Click the Options button to explore other display configurations; e.g., clustering

To learn more, see https://pixiedust.github.io/pixiedust/displayapi.html
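To make the clustering option more concrete, here is a plain-Python sketch of the grouping that a chart clustered by year represents: one group of bars per category, one bar per year. It only illustrates the shape of the data, using a few rows copied from df2; it is not how PixieDust computes the chart, and the function name cluster_by_year is invented for this example.

```python
from collections import defaultdict

# A few rows copied from df2 above: (year, category, unique_customers)
rows = [
    (2010, "Camping Equipment", 3),
    (2010, "Golf Equipment", 1),
    (2011, "Camping Equipment", 4),
    (2011, "Golf Equipment", 5),
    (2012, "Camping Equipment", 5),
    (2012, "Golf Equipment", 5),
]


def cluster_by_year(rows):
    """Group customer counts as {category: {year: unique_customers}}."""
    clustered = defaultdict(dict)
    for year, category, customers in rows:
        clustered[category][year] = customers
    return dict(clustered)


clustered = cluster_by_year(rows)
for category in sorted(clustered):
    # One group of bars per category, one bar per year within the group
    bars = ", ".join(
        "{}: {}".format(year, count)
        for year, count in sorted(clustered[category].items())
    )
    print("{} -> {}".format(category, bars))
```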

Loading external data

So far, we've worked with data hard-coded into our notebook. Now, let's load external data (CSV) from an addressable URL.



In [5]:

    
# load a CSV with pixiedust.sampleData()
df3 = pixiedust.sampleData("https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv")
display(df3)









    




[Chart: Distribution of MPG per Horsepower]
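For a sense of what loading a CSV from a URL involves, here is a standard-library sketch that fetches a file and turns each row into a dictionary keyed by the header. It is not how pixiedust.sampleData() is implemented, and it returns plain Python dicts rather than a Spark dataframe. It assumes a Python 3 kernel; the helper names parse_csv and load_csv and the two inline sample rows are invented for illustration.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_csv(url):
    """Download a CSV file and parse it (needs network access)."""
    with urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))


# Made-up sample so the example runs offline; these are not rows from cars.csv
sample = "mpg,horsepower\n30.5,70\n18.0,150\n"
records = parse_csv(sample)
print(records[0]["mpg"], records[0]["horsepower"])
```

Calling load_csv() with the cars.csv URL from the cell above should give one dict per row, with every value still a string; converting columns to numbers is left to the caller.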

You should see a scatterplot above, rendered again by matplotlib. Look at the Renderers control at the top right: it should now offer Seaborn as well as Bokeh. If Seaborn is missing, it isn't installed on your system. No problem, just install it by running the next cell.



In [28]:

    
# To install Seaborn, uncomment the next line, and then run this cell
#!pip install --user seaborn

If you installed Seaborn, you also need to restart the notebook kernel (find Restart in the Kernel menu above) and then rerun the cell that imports pixiedust.